Troubleshoot Slow Insert in Availability Group database and high HADR_SYNC_COMMIT Waits issue

Introduction

In this lab, you will troubleshoot the reason for slow inserts within an availability group database and corresponding high HADR_SYNC_COMMIT waits. The goal is to identify the cause and fix the issue.

Objectives

At the end of this lab, you will be able to:

  • Understand the cause of the slow inserts within an Availability Group Database.
  • Fix the issue and resolve the slow performance.

Estimated Time

45 minutes

Logon Information

Use the following credentials to login into virtual environment

  • Username: corpnet\cluadmin
  • Password: Pa$$w0rd

Before starting the training module, we recommend that you launch the labs and give them some time to stabilize. Please be aware that sometimes the AG may be in a resolving state and AG Replicas may be in a disconnected state. This is a platform issue and should stabilize after a few minutes

Environment review

Before you begin with the first exercise in the lab, let's review the lab environment.

  • In the lab, you have one Domain Controller and 3 nodes + 1 client computer.

  • AlwaysOnN1 and AlwaysOnN2 nodes are in the primary Datacenter.

  • AlwaysOnN3 is in the secondary datacenter.

  • AlwaysOnN1 and AlwaysOnN2 are configured in synchronous commit mode

  • For this lab, both the datacenters are in the same subnet.

  • Each node has Windows Server 2022 O/S installed.

  • SQL Server 2022 Standalone instances are installed on all the 3 nodes (i.e. AlwaysOnN1, AlwaysOnN2, AlwaysOnN3).

  • SQL Service logon account details

    • SQL Service Logon account: corpnet\SQLSvc
    • Password: Pa$$w0rd

Back to the list of Labs

Exercise 1: Connect to the Primary Replica and execute the workload

In this exercise, you will learn how to Connect to the Primary Replica and execute the workload.


Tasks

  1. Connect to the Primary Replica and execute the workload

    Perform this task on the virtual machine AlwaysOnClient.

  2. Open SSMS and connect to AGCorpListen. This is the listener for the Availability Group deployed between AlwaysOnN1, AlwaysOnN2 and AlwaysOnN3 replicas for the database AdventureWorks.

  3. After connecting to the listener, review the primary and secondary replicas under the AGCorp Availability Group. In the below example, AlwaysOnN2 is the primary replica and AlwaysOnN1 and AlwaysOnN3 are the secondaries.

    Image0080a.png

  4. Open the script window and execute the SlowInsert_AG script.

    Located at C:\scripts\ on the AlwaysonClient Machine

  5. Script inserts 10,000 rows 10 times. It will report the average time it took to insert 10,000 rows in seconds and the average time it took to insert a single row in milliseconds.

    image0082a.png

  6. You will also find large no of tasks waiting on HADR_SYNC_COMMIT

  7. The duration of these inserts is a lot more than the expected duration. We need to find out what is causing the inserts to perform slowly, validate it and take remediate actions to fix the issue.

  8. Feel free to run the script many times as you try various options to troubleshoot and fix the issue.

Congratulations!

You have successfully completed this exercise. Click Next to advance to the next exercise.

Exercise 2: Identify the cause of the failure and try and fix the issue

In this exercise, you will learn how to Identify the cause of the failure and try and fix the issue.


Tasks

  1. Find the root cause using the guidelines and cheat sheet provided below.

Guidelines

  1. This is a non-guided activity and the attendees are expected to try and troubleshoot this issue on their own.

  2. You can use any resources (including the internet or your own scripts), to troubleshoot the issue.

  3. You can use the tools discussed in the first module to help troubleshoot the issue.

  4. The possible causes discussed earlier in the lesson can be used as guidance for troubleshooting.

  5. The instructor will discuss the troubleshooting steps, cause and solution in detail after this lab session.

  6. You might have to login directly on the individual nodes to troubleshoot the issue.

  7. Ask yourself the below questions:

    • Where do you start?
    • What logs will provide me additional insights?
    • What tools can I use to troubleshoot this issue?
    • Could one or more of the possible causes discussed earlier be the issue here?

Cheat Sheet

  1. Tools to help you troubleshoot the issue

    This is not a complete list of commands/tools to help troubleshoot this issue. There are various ways/methods/approaches to troubleshoot an issue. These commands/tools referenced here could be used to look up/identify useful information for this lab.

  2. To reset instance's waittype counter values

    SQL
    DBCC SQLPERF('sys.dm_os_wait_stats',CLEAR) GO

Congratulations!

You have successfully completed this exercise. Click Next to advance to the next lab.